Formale Grundlagen der Fehlertoleranz in verteilten Systemen

Author

Felix C. Freiling

Abstract

A system is fault-tolerant if it maintains some form of correctness in the presence of faults. Fault-tolerant systems are necessary in many application areas of computing, especially where the failure of a computer system may lead to considerable damage. However, developing fault-tolerant systems is difficult and requires rigorous, scientifically sound engineering methodologies. Among other topics, these methodologies must cover (1) system and fault modelling, (2) design patterns for fault-tolerance mechanisms and (3) basic building blocks for fault-tolerant algorithms. This thesis contributes to research in all three of these areas. Chapter 2 treats questions of system and fault modelling: it presents and compares the four most prominent system models for fault-tolerant distributed systems, namely the model of partial synchrony by Dwork, Lynch and Stockmeyer, the failure detector model of Chandra and Toueg, the timed asynchronous system model of Cristian and Fetzer, and the quasi-synchronous model by Almeida, Veríssimo and Casimiro. Chapter 3 focusses on the underlying principles of fault-tolerant systems. The starting point is the fault-tolerance theory of Arora and Kulkarni, in which a fault-tolerant program is regarded as the composition of a fault-intolerant program and fault-tolerance components. We analyse the assumptions of the theory and extend it by defining formal notions of redundancy. Two forms of redundancy are identified (redundancy in space and redundancy in time), and we study which forms of redundancy are necessary to maintain which properties in the presence of faults. In chapter 4 we develop algorithmic building blocks for observation in faulty environments, a central problem in fault-tolerant computing that has so far received little research attention. Observation here means detecting whether or not a boolean predicate on global states holds during the execution of a distributed system. In the asynchronous system model with crash failures, we introduce two new observation modalities called negotiably and discernibly (which roughly correspond to the well-known modalities possibly and definitely by Cooper, Marzullo and Neiger) and present detection algorithms for them under increasingly weak fault assumptions.
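
To make the notion of observation concrete, the following is a minimal sketch of the classic modalities possibly and definitely for a fault-free run, evaluated by brute force over all consistent global states. It is not the thesis's algorithm for negotiably or discernibly; the function names, the message encoding and the two-process example are assumptions chosen purely for illustration.

```python
from itertools import product

# Hypothetical encoding (not from the thesis):
#   states[p][k] is the local state of process p after k of its events.
#   A cut is a tuple giving, per process, how many events have happened.
#   A message (s, se, r, re) is sent by the se-th event of process s
#   and received by the re-th event of process r.

def consistent(cut, messages):
    """A cut is consistent if every message received in it was also sent in it."""
    return all(cut[s] >= se for (s, se, r, re) in messages if cut[r] >= re)

def global_state(states, cut):
    return [local[k] for local, k in zip(states, cut)]

def possibly(states, messages, phi):
    """True if some consistent global state satisfies phi."""
    ranges = [range(len(local)) for local in states]
    return any(
        consistent(cut, messages) and phi(global_state(states, cut))
        for cut in product(*ranges)
    )

def definitely(states, messages, phi):
    """True if every observation (path through consistent cuts) meets phi."""
    start = tuple(0 for _ in states)
    final = tuple(len(local) - 1 for local in states)
    if phi(global_state(states, start)):
        return True
    frontier, seen = [start], {start}
    while frontier:
        cut = frontier.pop()
        if cut == final:
            return False  # found a complete observation that avoids phi
        for p in range(len(cut)):
            nxt = cut[:p] + (cut[p] + 1,) + cut[p + 1:]
            if (nxt[p] < len(states[p]) and nxt not in seen
                    and consistent(nxt, messages)
                    and not phi(global_state(states, nxt))):
                seen.add(nxt)
                frontier.append(nxt)
    return True

# Two processes, each counting its own events: x on process 0, y on process 1.
states = [[0, 1, 2], [0, 1, 2]]
both_one = lambda g: g[0] == 1 and g[1] == 1
sum_two = lambda g: g[0] + g[1] == 2
# Process 0's second event sends a message that process 1's first event receives.
messages = [(0, 2, 1, 1)]
```

Without messages, `possibly(states, [], both_one)` holds but `definitely(states, [], both_one)` does not, because an observer may see process 0 finish before process 1 starts. The predicate `sum_two` holds definitely, since every observation must pass through a state where the counters sum to two. With the message in place, the state where both counters equal one becomes inconsistent, so `possibly(states, messages, both_one)` no longer holds.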

Similar resources

Gliederung und Systematisierung von Schutzzielen in IT-Systemen

For many decades, the security and reliability of IT systems have been the subject of reflection, research and the development of solutions. The requirements placed on secure systems have grown along with the technical capabilities and the application areas of those systems. In the classical centralized data center, questions of availability, resilience against outages, and fault tolerance played and still play ...

Anfragebearbeitung und Routing in Schema-basierten P2P-Systemen

Summary: In connection with file-sharing applications and scalable distributed data structures, the peer-to-peer (P2P) paradigm has recently become increasingly widespread. Owing to their decentralized character, P2P systems promise increased robustness and scalability and thereby open up new possibilities for data integration applications. In such schema-based P2P sy...

HADES - Ein hochverfügbares verteiltes Main-Memory DBMS für eventbasierte Systeme

This contribution describes how, by using fault tolerance, hard disks can be replaced with a faster but more error-prone technology in order to increase the speed of databases in event-based systems. 1 Introduction and Fundamentals: While storage capacity has increased dramatically in recent years, the access time of hard disks has hardly ...

Replikation in Peer-to-Peer Systemen

In view of the growing size of today's distributed systems, the P2P paradigm is proving to be a promising alternative to traditional client/server architectures. To improve performance, availability and reliability, the data in such systems is replicated across a large number of peers. The problem of ensuring consistency that accompanies redundant data storage ...

Peer-to-Peer: Grundlagen und Architektur

Are peer-to-peer (P2P) systems a new approach to the design of distributed applications? Not really. P2P systems are a great success, but, to put it pointedly, they are "only" a return to the original Internet principle: every Internet node can communicate bidirectionally with every other Internet node, for example as both an FTP client and an FTP se...

Publication date: 2001
